Abstract 

IMGT, the international ImMunoGeneTics information system® (http://imgt.cines.fr), is a high quality 
integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TR), major 
histocompatibility complex (MHC) and related proteins of the immune system (RPI) of human and other 
vertebrates, created in 1989 by the Laboratoire d'ImmunoGénétique Moléculaire (LIGM). IMGT provides a 
common access to standardized data which include nucleotide and protein sequences, oligonucleotide 
primers, gene maps, genetic polymorphisms, specificities, 2D and 3D structures. IMGT consists of 
several sequence databases (IMGT/LIGM-DB, IMGT/MHC-DB, IMGT/PRIMER-DB), one genome 
database (IMGT/GENE-DB) and one three-dimensional structure database (IMGT/3Dstructure-DB), 
interactive tools for sequence analysis (IMGT/V-QUEST, IMGT/JunctionAnalysis, IMGT/PhyloGene, 
IMGT/Allele-Align), for genome analysis (IMGT/GeneSearch, IMGT/GeneView, IMGT/LocusView) and for 
3D structure analysis (IMGT/StructuralQuery), and Web resources ("IMGT Marie-Paule page") 
comprising 8000 HTML pages. IMGT other accesses include SRS, FTP, search by BLAST, etc. By its 
high quality and its easy data distribution, IMGT has important implications in medical research 
(repertoire in autoimmune diseases, AIDS, leukemias, lymphomas, myelomas), veterinary research, 
genome diversity and genome evolution studies of the adaptive immune responses, biotechnology 
related to antibody engineering (scFv, phage displays, combinatorial libraries) and therapeutical 
approaches (grafts, immunotherapy). 

IMGT is freely available at http://imgt.cines.fr. 

Key words: IMGT, ontology, database, information system, knowledge resource, immunoinformatics, 
immunogenetics, antibody, immunoglobulin, T cell receptor, immunoglobulin superfamily, leukemia, 
lymphoma, MHC, HLA, Collier de Perles, three-dimensional (3D) structure, primer, polymorphism. 



Introduction 

The molecular synthesis and genetics of the immunoglobulin (IG) and T cell receptor (TR) chains are 
particularly complex and unique, as they include biological mechanisms such as DNA molecular 
rearrangements in multiple loci (three for IG and four for TR in humans) located on different 
chromosomes (four in humans), nucleotide deletions and insertions at the rearrangement junctions (or 
N-diversity), and somatic hypermutations in the IG loci (for review, see Lefranc and Lefranc, 2001a; 2001b). 
The number of potential protein forms of IG and TR is almost unlimited. Owing to the complexity and high 
number of published sequences, data control, classification and detailed annotation are a very 
difficult task for the generalist databanks such as EMBL (UK) [Stoesser et al., 2003], GenBank (USA) 
[Benson et al., 2003] and DDBJ (Japan) [Miyazaki et al., 2003]. These observations were the starting 
point of IMGT, the international ImMunoGeneTics information system® [Lefranc, 2003a], created in 1989 
by the Laboratoire d'ImmunoGénétique Moléculaire (LIGM) (Université Montpellier II and CNRS) at 
Montpellier, France. 

IMGT is a high quality integrated knowledge resource specializing in IG, TR, major histocompatibility 
complex (MHC) and related proteins of the immune system (RPI) of human and other vertebrate species 
[Giudicelli et al., 1997; Lefranc et al., 1998; 1999; Ruiz et al., 2000; Lefranc, 2001a; 2002; 2003a; 2003b]. 
IMGT consists of several sequence databases (IMGT/LIGM-DB, IMGT/MHC-DB, IMGT/PRIMER-DB, 
IMGT/PROTEIN-DB, this last one in development), one genome database (IMGT/GENE-DB) and one 
three-dimensional (3D) structure database (IMGT/3Dstructure-DB), interactive tools for sequence analysis 
(IMGT/V-QUEST, IMGT/JunctionAnalysis, IMGT/PhyloGene, IMGT/Allele-Align), for genome analysis 
(IMGT/GeneSearch, IMGT/GeneView, IMGT/LocusView) and for 3D structure analysis 
(IMGT/StructuralQuery), and Web resources ("IMGT Marie-Paule page") comprising 8000 HTML pages 
which include IMGT Scientific chart, IMGT Repertoire (for IG and TR, MHC, RPI), IMGT Bloc-notes, 
IMGT Education and IMGT Index. IMGT other accesses include SRS, FTP, search by BLAST, etc. By its 
high quality and its easy data distribution, IMGT has important implications in medical research 
(repertoire in normal and pathological situations: autoimmune diseases, infectious diseases, AIDS, 
detection of residual diseases in leukemias, lymphomas, myelomas), veterinary research, genome 
diversity and genome evolution studies of the adaptive immune responses, biotechnology related to 
antibody engineering (single chain Fragment variable (scFv), phage displays, combinatorial libraries) and 
therapeutical approaches (grafts, immunotherapy). IMGT is freely available at http://imgt.cines.fr. 



IMGT-ONTOLOGY 

IMGT has developed a formal specification of the terms to be used in the domain of immunogenetics and 
immunoinformatics to ensure accuracy, consistency and coherence in IMGT. This has been the basis of 
IMGT-ONTOLOGY [Giudicelli and Lefranc, 1999], the first ontology in the domain, which allows the 
management of the immunogenetics knowledge for human and other vertebrate species. 
IMGT-ONTOLOGY comprises five main concepts: IDENTIFICATION, DESCRIPTION, CLASSIFICATION, 
NUMEROTATION and OBTENTION [Giudicelli and Lefranc, 1999]. Standardized keywords, standardized 
sequence annotation, standardized IG and TR gene nomenclature, the IMGT unique numbering, and 
standardized origin/methodology were defined, respectively, based on these five main concepts. The 
controlled vocabulary and the annotation rules for data and knowledge management of the IG, TR, MHC 
and RPI of human and other vertebrate species constitute the IMGT Scientific chart. All IMGT data are 
expertly annotated according to the IMGT Scientific chart. IMGT is the global internationally 
acknowledged reference in immunogenetics and immunoinformatics. 



The IDENTIFICATION concept: standardized keywords 

IMGT standardized keywords have been assigned to all IMGT/LIGM-DB entries. They include (i) general 
keywords, which are indispensable for the sequence assignments, are described in an exhaustive and 
non-redundant list and are organized in a tree structure, and (ii) specific keywords, which are more specifically 
associated with particularities of the sequences (orphon, transgene, etc.) or with diseases (leukemia, 
lymphoma, myeloma, etc.) [Giudicelli et al., 1997]. The list is not definitive and new specific keywords can 
easily be added if needed. 



The DESCRIPTION concept: standardized labels and annotations 

Two hundred fifteen feature labels are necessary in IMGT/LIGM-DB to describe all structural and 
functional subregions that compose IG and TR sequences [Giudicelli et al., 1997], whereas only seven of 
them are available in EMBL, GenBank or DDBJ. Annotation of sequences with these labels constitutes 
the main part of the expertise. Levels of annotation have been defined, which allow the users to query 
sequences in IMGT/LIGM-DB even though they are not fully annotated [Giudicelli et al., 1997]. An 
internal tool, IMGT/Automat, has been developed to automatically perform the annotation of the 
rearranged cDNA sequences in IMGT/LIGM-DB [Giudicelli et al., 2003]. One hundred seventy-two 
additional labels were defined for IG, TR, MHC and RPI amino acid sequences and domain structures in 
IMGT/PROTEIN-DB and IMGT/3Dstructure-DB. Prototypes represent the organizational relationship 
between labels and give information on the order and expected length (in number of nucleotides) of the 
labels [Giudicelli et al., 1997; Lefranc et al., 1999]. Prototypes can apply to the general configuration of IG, TR 
or MHC, independently of the chain type, the species or any other parameters like functionality. However, 
prototypes may also be established for very precise cases when sequence characteristics are clearly 
established. 
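
As an illustration of how a prototype could constrain an annotation, the following Python sketch models a prototype as an ordered list of labels with expected nucleotide lengths and checks an annotated sequence against it. The class and function names, the choice of labels (written on the FR-IMGT/CDR-IMGT pattern) and the length ranges are assumptions made for this example; they are not taken from the IMGT Scientific chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrototypeSlot:
    """One label of a prototype with its expected length range in nucleotides."""
    label: str
    min_length: int
    max_length: int


@dataclass(frozen=True)
class Feature:
    """An annotated feature: label plus 1-based inclusive start and end."""
    label: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def check_against_prototype(features, prototype):
    """Return a list of problems; an empty list means the annotation conforms."""
    if [f.label for f in features] != [s.label for s in prototype]:
        return ["label order differs from prototype"]
    problems = []
    for feature, slot in zip(features, prototype):
        if not slot.min_length <= feature.length <= slot.max_length:
            problems.append(
                f"{feature.label}: length {feature.length} outside "
                f"{slot.min_length}-{slot.max_length}"
            )
    return problems


# Hypothetical prototype: the length ranges are illustrative values only.
TOY_PROTOTYPE = [
    PrototypeSlot("FR1-IMGT", 60, 78),
    PrototypeSlot("CDR1-IMGT", 15, 36),
    PrototypeSlot("FR2-IMGT", 45, 51),
]
```

A prototype for a very precise case would simply use narrower length ranges, whereas a general prototype would leave them wide.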



The CLASSIFICATION concept: standardized IG and TR gene nomenclature 

The objective is to provide immunologists and geneticists with a standardized nomenclature per locus 
and per species which will allow extraction and comparison of data for the complex B and T cell antigen 
receptor molecules. The CLASSIFICATION concept has been used to set up a unique nomenclature of 
human IG and TR genes, which was approved by the Human Genome Organization (HUGO) 
Nomenclature Committee (HGNC) in 1999 [Wain et al., 2002] and has become the community standard. 
The complete list of the human IG and TR gene names [Lefranc and Lefranc, 2001a; 2001b; Lefranc, 
2000a; 2000b; 2000c; 2000d; Lefranc, 2001b; 2001c; 2001d] has been entered by the IMGT 
Nomenclature Committee in the Genome DataBase GDB (Canada), LocusLink at NCBI (USA), and 
GeneCards. The complete list of the mouse IG and TR gene names was sent by IMGT, in July 2002, to 
the Mouse Genome Informatics MGI Mouse Genome Database MGD (USA), LocusLink at NCBI, and 
HGNC. Both lists are available from the IMGT site [Lefranc, 2003a] and queries on the human and mouse 
IG and TR gene classification and gene names can be made from IMGT/GENE-DB. IMGT reference 
sequences have been defined for each allele of each gene based on one or, whenever possible, several 
of the following criteria: germline sequence, first sequence published, longest sequence, mapped 
sequence [Lefranc et al., 1999]. 



The NUMEROTATION concept: the IMGT unique numbering 

A uniform numbering system for IG and TR sequences of all species has been established to facilitate 
sequence comparison and cross-referencing between experiments from different laboratories whatever 
the antigen receptor (IG or TR), the chain type, the domain (variable V or constant C), or the species 
[Lefranc, 1997; 1999; Lefranc et al., 2003]. This numbering results from the analysis of more than 5000 
IG and TR variable region sequences of vertebrate species from fish to human. It takes into account and 
combines the definition of the framework (FR) and complementarity determining regions (CDR) [Kabat, 
1991], structural data from X-ray diffraction studies [Satow et al., 1986] and the characterization of the 
hypervariable loops [Chothia and Lesk, 1987]. In the IMGT numbering, conserved amino acids from 
frameworks always have the same number whatever the IG or TR variable sequence, and whatever the 
species they come from (as examples: Cysteine 23 (in FR1), Tryptophan 41 (in FR2), Leucine 89 and 
Cysteine 104 (in FR3)). Based on the IMGT unique numbering, standardized 2D graphical 
representations, designated as IMGT Colliers de Perles [Lefranc et al., 1999], are available in IMGT 
Repertoire. This IMGT unique numbering has several advantages: 

■ It has allowed the redefinition of the limits of the FR and CDR of the IG and TR variable regions 
[Lefranc and Lefranc, 2001a; 2001b] and domains [Ruiz and Lefranc, 2002]. The FR-IMGT and 
CDR-IMGT lengths become in themselves crucial information which characterize variable regions 
belonging to a group, a subgroup and/or a gene. 

■ Framework amino acids (and codons) located at the same position in different sequences can be 
compared without requiring sequence alignments. This also holds for amino acids belonging to 
CDR-IMGT of same length. 

■ The unique numbering is used as the output of the IMGT/V-QUEST alignment tool. The aligned 
sequences are displayed according to the IMGT numbering and with the FR-IMGT and CDR-IMGT 
delimitations. 

■ The unique numbering has allowed a standardization of the description of mutations and the 
description of IG and TR allele polymorphisms [Lefranc and Lefranc, 2001a; 2001b]. These 
mutations and allelic polymorphisms are described by comparison to the IMGT reference sequences 
of the alleles *01 [Lefranc, 1998]. 

■ The unique numbering allows the description and comparison of somatic hypermutations of the IG 
IMGT variable domains. 
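
The position-based comparison described in the list above can be sketched in a few lines of Python. The sketch assumes that a sequence is stored as a gapped string in which the character at index i holds IMGT position i + 1 and a dot marks an unoccupied position; this storage convention, the helper names, the "C23>S" output style and the toy sequence are assumptions for illustration, not IMGT software or data. Only the four conserved positions quoted in the text are used.

```python
GAP = "."

# Conserved framework positions quoted in the text (1-based IMGT numbers).
CONSERVED = {23: "C", 41: "W", 89: "L", 104: "C"}


def residue_at(gapped: str, position: int) -> str:
    """Return the residue at a 1-based position of a gapped sequence."""
    return gapped[position - 1]


def check_conserved(gapped: str) -> dict[int, bool]:
    """Report, for each conserved position, whether the expected residue is present."""
    return {pos: residue_at(gapped, pos) == aa for pos, aa in CONSERVED.items()}


def describe_differences(reference: str, query: str) -> list[str]:
    """List substitutions between two gapped sequences of equal length.

    No alignment step is needed because both strings share the same numbering.
    Positions where either sequence has a gap are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("gapped sequences must have the same length")
    changes = []
    for index, (ref_aa, qry_aa) in enumerate(zip(reference, query)):
        if GAP in (ref_aa, qry_aa) or ref_aa == qry_aa:
            continue
        changes.append(f"{ref_aa}{index + 1}>{qry_aa}")
    return changes


def make_toy_sequence() -> str:
    """Build a 104-position toy sequence carrying the four conserved residues."""
    residues = ["A"] * 104
    for pos, aa in CONSERVED.items():
        residues[pos - 1] = aa
    return "".join(residues)
```

Comparing a query with the reference sequence of an allele *01 then reduces to a call such as describe_differences(reference, query), with each reported change already expressed in the shared numbering.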

By facilitating the comparison between sequences and by allowing the description of alleles and 
mutations, the IMGT unique numbering represents a big step forward in the analysis of the IG and TR 
variable region (V-REGION) sequences of all vertebrate species [Pommié et al., 2003] (IMGT Repertoire 
(IG and TR)). Moreover, it gives insight into the structural configuration of the variable domain 
(V-DOMAIN encoded by the V-J- or V-D-J-REGION) [Ruiz and Lefranc, 2002; Lefranc et al., 2003]. The 
IMGT unique numbering opens interesting views on the evolution of the proteins belonging to the 
"immunoglobulin superfamily" (IgSF) [Williams and Barclay, 1988]. It has been applied with success to all 
the sequences of domains belonging to the IgSF V-set, designated as V-LIKE-DOMAINs in IMGT, which 
include non-rearranging sequences in vertebrates (human CD4 [D1.D3], Xenopus CTXgl, etc.) and in 
invertebrates (Drosophila amalgam, Drosophila fasciclin II, etc.) [Lefranc, 1997; 1999; Lefranc et al., 2003] 
(IMGT Repertoire (RPI)). This standardized approach has been applied to the constant domain 
(C-DOMAIN) of the IG and TR (IMGT Repertoire (IG and TR)), and to the C-LIKE-DOMAINs of proteins 
other than IG and TR (IMGT Repertoire (RPI)). An IMGT unique numbering has also been implemented 
for the groove domain (G-DOMAIN) [Duprat et al., 2003] of the MHC class I and II chains (IMGT 
Repertoire (MHC)), and for the G-LIKE-DOMAINs of proteins other than MHC (IMGT Repertoire (RPI)). 



The OBTENTION concept: standardized origin/methodology 

The OBTENTION concept is a set of standardized terms that specify the origins of the sequence (the 
'origin' concept) and the conditions in which the sequences have been obtained (the 'methodology' 
concept). The 'origin' concept comprises the subsets of 'cell, tissue or organ', 'auto-immune diseases', 
'clonal expansion diseases' (such as leukemia, lymphoma, myeloma), whereas the 'methodology' concept 
comprises the subsets related to the 'hybridoma', to the experimental conditions (sequences amplified by 
'PCR'), to the obtention from 'libraries' (genomic, cDNA, combinatorial, etc.) or from 'transgenic' 
organisms (animal, plant). At this stage of development, the exhaustive definition of the concepts of 
obtention and of their instances is still in progress. 



IMGT databases 

IMGT sequence databases: IMGT/LIGM-DB, IMGT/MHC-DB, IMGT/PRIMER-DB 

IMGT/LIGM-DB is the comprehensive IMGT database of IG and TR nucleotide sequences from human 
and other vertebrate species, with translation for fully annotated sequences, based on the 
IDENTIFICATION and DESCRIPTION concepts, created in 1989 by LIGM, Montpellier, France, on the 
Web since July 1995 [Giudicelli et al., 1997; Lefranc et al., 1998; 1999; Ruiz et al., 2000; Lefranc, 2001a; 
2002; 2003a; 2003b]. 

IMGT/LIGM-DB is the first and the largest database of IMGT, the international ImMunoGeneTics 
information system®. In November 2003, IMGT/LIGM-DB contained 78,500 nucleotide sequences of IG 
and TR from human and 150 other vertebrate species. 

IMGT/LIGM-DB sequence data are identified by the EMBL/GenBank/DDBJ accession number. The 
unique source of data for IMGT/LIGM-DB is EMBL which shares data with the other two generalist 
databases GenBank and DDBJ (IMGT/LIGM-DB Sequence submission). Once the sequences are 
allowed by the authors to be made public, LIGM automatically receives IG and TR sequences by e-mail 
from EMBL. After control by LIGM curators, data are scanned to store sequences, bibliographical 
references and taxonomic data, and standardized IMGT/LIGM-DB keywords are assigned to all entries. 
Based on expert analysis, specific detail annotations are added to IMGT flat files in a second step 
[Giudicelli et al., 1997]. Since August 1996, the IMGT/LIGM-DB content has closely followed that of the 
EMBL for the IG and TR, with the following advantages: IMGT/LIGM-DB contains IG and TR entries 
which have disappeared from the generalist databases (as examples: the L36092 accession number 
which encompasses the complete human TRB locus is still present in IMGT/LIGM-DB, whereas it has 
been deleted from EMBL/GenBank/DDBJ due to its too large size (684973 bp); in 1999, IMGT detected 
the disappearance of 20 IG and TR sequences which had inadvertently been lost by GenBank, and 
enabled the recovery of these sequences in the generalist databases); conversely, IMGT/LIGM-DB 
does not contain sequences which have previously been wrongly assigned to IG and TR. 
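
The kind of consistency check implied by this paragraph can be sketched as a comparison of accession number sets. The following Python example is a minimal illustration under stated assumptions: the function name and the three input sets are hypothetical, and nothing here describes the actual procedure used by the LIGM curators.

```python
def compare_accessions(curated: set[str], generalist: set[str], rejected: set[str]):
    """Compare accession numbers held locally with a generalist release.

    curated    -- accessions currently stored in the specialized database
    generalist -- accessions present in the latest generalist release
    rejected   -- accessions that curators judged not to be IG or TR
    """
    # Entries kept locally that no longer appear upstream.
    disappeared = sorted(curated - generalist)
    # New upstream entries that have not yet been curated or rejected.
    to_review = sorted(generalist - curated - rejected)
    return disappeared, to_review
```

With L36092 in the curated set but absent from the generalist set, the accession would be reported in the first list, which mirrors the example of the human TRB locus given above.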

The IMGT/LIGM-DB data, based on the DESCRIPTION concepts, are provided with a user friendly 
interface. The Web interface allows searches according to immunogenetic specific criteria and is easy to 
use without any knowledge of a programming language. The interface allows users to connect easily 
from any type of platform (PC, Macintosh, workstation) using freeware such as Explorer or 
Netscape. All IMGT/LIGM-DB information is available through five modules of search: Catalogue, 
Taxonomy and Characteristics, Keywords, Annotation labels and References. Selection is displayed at 
the top of the "results of your search" page, so the users can check their own queries [Lefranc et al., 
1999]. Users have the possibility to modify their request or to consult the results. They can (i) add new 
conditions to increase or decrease the number of resulting sequences, (ii) view details: selecting this 
"View" option provides a list of resulting sequences; selection of one sequence in the list offers nine 
possibilities: annotations, IMGT flat file, coding regions with protein translation, catalogue and external 
references, sequence in dump format, sequence in FASTA format, sequence with three reading frames, 
EMBL flat file, IMGT/V-QUEST, or (iii) search for sequence fragments: selecting this "Subsequences" 
option allows searching for sequence fragments (subsequences) corresponding to a particular label for 
the resulting sequences (available for fully annotated sequences) [Lefranc et al., 1999]. 



http://www.bioinfo.de/isb/2003/04/0004/main.html 



28/11/2003 



IMGT-ONTOLOGY for immunogenetics and immunoinformatics 






IMGT/LIGM-DB data are also distributed by anonymous FTP servers at the Centre Informatique National 
de l'Enseignement Supérieur (CINES), Montpellier, France (ftp://ftp.cines.fr/IMGT/), at the European 
Bioinformatics Institute (EBI), Hinxton, UK (ftp://ftp.ebi.ac.uk/pub/databases/imgt/) and at the Institut de 
Génétique Humaine (IGH), Montpellier, France (ftp://ftp.igh.cnrs.fr/pub/IMGT), and from many SRS 
(Sequence Retrieval System) sites (IMGT other accesses>SRS). IMGT/LIGM-DB releases are produced 
weekly. Users can compare their own sequences against IMGT/LIGM-DB data using BLAST or FASTA 
on different servers (EBI, IGH, INFOBIOGEN, Institut Pasteur Paris, etc.) (IMGT other 
accesses>Compare your sequence against IMGT (BLAST, FASTA)). 

IMGT/MHC-DB, hosted on the EBI server at Hinxton (UK), comprises a database of the human MHC 
allele sequences (IMGT/MHC-HLA, developed by Cancer Research, UK and Anthony Nolan Research 
Institute (ANRI), London, UK, on the Web since December 1998; 1,646 entries in August 2003) 
[Robinson et al., 2000; 2003], and databases of the MHC class II sequences from non-human primates 
(IMGT/MHC-NHP, curated by the Biomedical Primate Research Centre (BPRC), Rijswijk, The 
Netherlands) and from felines and canines (IMGT/MHC-FLA and IMGT/MHC-DLA, curated by the Centre 
for Integrated Genomic Medical Research, Manchester, UK), on the Web since April 2002 [Robinson et 
al., 2003]. 

IMGT/PRIMER-DB is the IMGT oligonucleotide (primer) database for the IG and TR, created by LIGM 
(Montpellier, France) in collaboration with EUROGENTEC S.A. (Seraing, Belgium), on the Web since 
February 2002. The IG and TR primers are useful for the analysis of the IG and TR gene repertoire and 
expression, the detection of minimal residual diseases in B and T cell malignancies, the construction of 
antibody combinatorial libraries, scFv, phage display or microarray technologies. In November 2003, 
IMGT/PRIMER-DB contained 1507 entries from Homo sapiens and Mus musculus. 

IMGT/PRIMER-DB contains information on primers and combinations of primers described as 
"sets" [primers sharing identical properties (species, group and orientation)] and "couples" [sets of 
opposite orientation for which IMGT/LIGM-DB sequences are known (or expected)]. Primers, Sets and 
Couples are described in IMGT Primer cards, IMGT Set cards and IMGT Couple cards, respectively. An 
IMGT Primer is an oligonucleotide described by comparison to an IMGT/LIGM-DB reference sequence, 
according to the standardized rules of the IMGT Scientific chart, based on IMGT-ONTOLOGY [Giudicelli 
and Lefranc, 1999]. Taxonomy species and IMGT classification (group, subgroup, gene, allele) of a 
primer are those of the IMGT/LIGM-DB reference sequence, and not those of the PCR amplification 
products. This provides the following advantages for the data standardization: IMGT/PRIMER-DB primer 
definition, classification and description are independent from the experimental conditions, from DNA 
sources and from the different combinations (sets and couples) in which the primer can be used. That 
means that (i) the specificity of a primer (subgroup, gene or allele specific) which is either described 
experimentally or deduced from sequence comparison is not used for the primer classification, although 
these data are provided in the IMGT/PRIMER-DB Primer card in "Classification comments and 
specificity", (ii) the sequences resulting from the PCR amplifications are uniquely associated to the 
couples. The IMGT Primer cards are linked to IMGT/LIGM-DB flat files, IMGT Colliers de Perles and 
IMGT Repertoire>Alignments of alleles of the IMGT/LIGM-DB reference sequence used for the primer 
description. 
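
The relationship between primers, sets and couples can be sketched as a small data model. In the Python example below, the field names, the orientation values "sense" and "antisense", and the primer names are assumptions made for illustration. The sketch also simplifies the definition of a couple: it pairs sets of opposite orientation within one species, and does not test whether IMGT/LIGM-DB sequences are known or expected for the pair.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Primer:
    """A primer classified from its reference sequence, not from PCR products."""
    name: str
    species: str
    group: str
    orientation: str  # assumed values: "sense" or "antisense"


def build_sets(primers):
    """Group primer names sharing identical species, group and orientation."""
    sets = defaultdict(list)
    for primer in primers:
        key = (primer.species, primer.group, primer.orientation)
        sets[key].append(primer.name)
    return dict(sets)


def build_couples(sets):
    """Pair set keys of opposite orientation within the same species."""
    sense = [key for key in sets if key[2] == "sense"]
    antisense = [key for key in sets if key[2] == "antisense"]
    return [(s, a) for s, a in product(sense, antisense) if s[0] == a[0]]
```

Because the set key depends only on the primer's own classification, a primer keeps the same description whatever the couples in which it is later used.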



IMGT genome database: IMGT/GENE-DB 

IMGT/GENE-DB is the comprehensive IMGT genome database for IG and TR genes from human and 
mouse, and in development, from other vertebrates, created by LIGM, on the Web since January 2003. 
IMGT/GENE-DB annotated data are extracted from IMGT Repertoire, the global ImMunoGeneTics Web 
Resource, and from the IMGT/LIGM-DB database. All the human IMGT gene names [Lefranc and 
Lefranc, 2001a; 2001b] were approved by HGNC in 1999 [Wain et al., 2002], and entered in GDB 
(Canada), LocusLink at NCBI (USA), and GeneCards. In August 2003, IMGT/GENE-DB contained 1,375 
genes and 2,201 alleles (673 IG and TR genes and 1,024 alleles from Homo sapiens, and 702 IG and TR 
genes and 1,177 alleles from Mus musculus, Mus cookii, Mus pahari, Mus spretus, Mus saxicola, Mus 
minutoides). 

IMGT/GENE-DB allows a search of IG and TR gene entries by locus, group, subgroup, based on the 
CLASSIFICATION concept of IMGT-ONTOLOGY [Giudicelli and Lefranc, 1999]. A shortcut allows 
searching for genes by a selection on gene name (according to the IMGT nomenclature) or on clone name(s) 
(data from the "Reference sequences" and "Sequences from the literature" columns in IMGT 
Repertoire>Gene tables). The selection is displayed at the top of the resulting genes page. The users can select the 
genes to view their detailed entries. Each IMGT/GENE-DB entry corresponds to one gene and provides, 
for each gene, the chromosomal localization, the gene name and definition, the number of alleles, links to 
IMGT Repertoire and to external sequence databases (EMBL, GenBank, DDBJ), genome databases 
(GDB, LocusLink, OMIM), and nomenclature database (HGNC Genew). Reciprocally, LocusLink, GDB, 
GeneCards and HGNC Genew have direct links to the IMGT/GENE-DB entries. 

IMGT/GENE-DB provides for each allele, the functionality, the clone names, the IMGT/LIGM-DB 
reference sequence accession numbers (with link to the flat files), and the "IMGT/GENE-DB reference 
sequences in FASTA format" (nucleotide and amino acid sequences of the coding regions extracted from 
the IMGT/LIGM-DB reference sequence), with gaps according to the IMGT unique numbering [Lefranc et 
al., 2003] (based on the IMGT Scientific chart rules and on the NUMEROTATION concept of 
IMGT-ONTOLOGY). 
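
Gapped reference sequences in FASTA format can be read with a few lines of Python, as in the minimal sketch below. It assumes that gaps are written as dots and makes no assumption about the content of the header line; the function names are hypothetical and the example is not an IMGT tool.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, keeping gap characters."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines of one record are concatenated in order.
            records[header] += line
    return records


def ungap(sequence: str, gap: str = ".") -> str:
    """Remove gap characters to recover the contiguous sequence."""
    return sequence.replace(gap, "")


def position_map(gapped: str, gap: str = ".") -> list[int]:
    """For each ungapped residue, give its 1-based position in the gapped sequence."""
    return [index + 1 for index, char in enumerate(gapped) if char != gap]
```

Keeping the gaps while parsing preserves the numbering, and position_map relates each residue of the contiguous sequence back to its numbered position.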



IMGT 3D structure database: IMGT/3Dstructure-DB 

IMGT/3Dstructure-DB is the IMGT 3D structure database for IG, TR, MHC and RPI of human and other 
vertebrate species, created by LIGM, on the Web since November 2001 [Ruiz and Lefranc, 2002; Kaas 
and Lefranc, 2002]. IMGT/3Dstructure-DB comprises IG, TR, MHC and RPI with known 3D structures. In 
August 2003, the IMGT/3Dstructure-DB database managed 634 coordinate files which correspond to 422 
different proteins (260 IG, 18 TR and 144 MHC). 

Coordinate files are extracted from the Protein Data Bank PDB [Berman et al., 2000], and IMGT 
annotations are added according to the IMGT Scientific chart rules, based on the IMGT-ONTOLOGY 
concepts [Giudicelli and Lefranc, 1999]. An IMGT/3Dstructure-DB card provides IMGT gene and allele 
identification (based on the CLASSIFICATION concept), domain delimitations (based on the 
DESCRIPTION concept), amino acid positions according to the IMGT unique numbering [Lefranc et al., 
2003] (based on the NUMEROTATION concept). Domains that are analysed in IMGT/3Dstructure-DB 
include V-DOMAIN (variable) and C-DOMAIN (constant) found in IG and TR, V-LIKE and 
C-LIKE-DOMAINs found in proteins other than IG and TR, G-DOMAIN (groove) found in MHC, and 
G-LIKE-DOMAINs found in proteins other than MHC [Duprat et al., 2003]. Moreover, IMGT/3Dstructure-DB 
provides renumbered coordinate flat files, Collier de Perles (standard and two-layer 2D graphical 
representations) [Ruiz and Lefranc, 2002; Lefranc et al., 2003], and results of contact analysis. The IMGT 
unique numbering and gene standardization will provide a great help in large scale sequence-structure 
studies and more generally in protein engineering. 



IMGT interactive tools 

IMGT interactive immunoinformatics tools rely on the DESCRIPTION and NUMEROTATION concepts of 
IMGT-ONTOLOGY, and include sequence, genome and 3D structure analysis tools. 



IMGT tools for sequence analysis 

IMGT/V-QUEST (V-QUEry and STandardization) is an integrated software for IG and TR [Lefranc, 2001a; 
2003b]. This easy-to-use tool analyses an input IG or TR germline or rearranged variable nucleotide 
sequence. IMGT/V-QUEST results comprise the identification of the V, D and J genes and alleles and the 
nucleotide alignments by comparison with sequences from the IMGT reference directory, the 
delimitations of the FR-IMGT and CDR-IMGT based on the IMGT unique numbering, the protein 
translation of the input sequence, the identification of the JUNCTION, and the two-dimensional Collier de 
Perles representation of the V-REGION. IMGT/V-QUEST is particularly useful for the analysis of the 
rearranged variable genes: it identifies the V-GENE and J-GENE and alleles involved in the IG 
and TR rearrangements, and to delimit the JUNCTION. Searches can be done related to IG and TR of 
human and mouse, and of other species (non-human primates, sheep, teleostei and chondrichthyes). 
IMGT/V-QUEST can also be used for the analysis of functional or ORF germline variable genes from 
other species, to delimit the FR-IMGT and CDR-IMGT, provided that the similarity with sequences of the 
IMGT/V-QUEST reference directory sets is sufficiently high. The sets of sequences from the IMGT 
reference directory, used for IMGT/V-QUEST, can be downloaded in FASTA format from the IMGT site. 

IMGT/JunctionAnalysis is a tool, complementary to IMGT/V-QUEST, which provides a thorough analysis 
of the V-J and V-D-J junction of IG and TR rearranged genes. IMGT/JunctionAnalysis identifies the D- 
GENEs and alleles involved in the IGH, TRB and TRD V-D-J rearrangements by comparison with the 
IMGT reference directory, and delimits precisely the P, N and D regions (IMGT/JunctionAnalysis output 
results). Results from IMGT/JunctionAnalysis are more accurate than those given by IMGT/V-QUEST 
regarding the D-GENE identification. Indeed, IMGT/JunctionAnalysis works on shorter sequences 
(JUNCTION), and with a higher constraint since the identification of the V-GENE and J-GENE and alleles 
is a prerequisite to perform the analysis. Several hundreds of junction sequences can be analysed 
simultaneously. 

IMGT/PhyloGene is an easy to use tool for phylogenetic analysis of variable region (V-REGION) and 
constant domain (C-DOMAIN) sequences. This tool is particularly useful in developmental and 
comparative immunology. The users can analyse their own sequences by comparison with the IMGT 
standardized reference sequences for human and mouse IG and TR [Elemento and Lefranc, 2003]. 

IMGT/Allele-Align allows the comparison of two alleles highlighting the nucleotide and amino acid 
differences. 
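
A comparison of this kind can be sketched in Python as a codon-by-codon walk over two coding sequences of equal length. The sketch below is not the IMGT/Allele-Align implementation: the function name and the output layout are assumptions, codons are numbered sequentially from 1 rather than by the IMGT unique numbering, and only the unambiguous bases A, C, G and T of the standard genetic code are supported.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def compare_alleles(reference: str, other: str) -> list[dict]:
    """List codons that differ between two in-frame coding sequences."""
    if len(reference) != len(other) or len(reference) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    differences = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3].upper()
        alt_codon = other[start:start + 3].upper()
        if ref_codon == alt_codon:
            continue
        codon_number = start // 3 + 1
        # Nucleotide changes, numbered from 1 along the coding sequence.
        nt_changes = [
            f"{r.lower()}{start + offset + 1}>{a.lower()}"
            for offset, (r, a) in enumerate(zip(ref_codon, alt_codon))
            if r != a
        ]
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        # None marks a silent change (same amino acid).
        aa_change = None if ref_aa == alt_aa else f"{ref_aa}{codon_number}>{alt_aa}"
        differences.append(
            {"codon": codon_number, "nucleotides": nt_changes, "amino_acid": aa_change}
        )
    return differences
```

Reporting the nucleotide and amino acid levels together makes silent substitutions visible, since they appear with a nucleotide change but no amino acid change.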



IMGT tools for genome analysis 

IMGT/GeneSearch, IMGT/GeneView and IMGT/LocusView are tools which provide the display of 
physical maps for the human IG, TR and MHC loci. The mouse TRA/TRD locus is also available. 



IMGT tool for 3D structure analysis 

IMGT/StructuralQuery is a tool which retrieves the IMGT/3Dstructure-DB entries, based on 
specific structural characteristics: phi and psi angles, accessible surface area (ASA), amino acid type, 
distance in angstrom between amino acids, CDR-IMGT lengths [Kaas and Lefranc, 2003]. 
IMGT/StructuralQuery is currently available for the V-DOMAINs. 



IMGT Web resources ("IMGT Marie-Paule page") 

IMGT Web resources ("IMGT Marie-Paule page") [Lefranc, 2003a] consist of 8000 HTML pages and 
comprise the following sections: "IMGT Scientific chart", "IMGT Repertoire", "IMGT Bloc-notes", "IMGT 
Education" and "IMGT Index". 



IMGT Scientific chart 

IMGT Scientific chart provides the controlled vocabulary and the annotation rules and concepts defined 
by IMGT-ONTOLOGY [Giudicelli and Lefranc, 1999] for the identification, the description, the 
classification and the numerotation of the IG, TR, MHC and RPI data of human and other vertebrates. 
The IMGT Scientific chart rules are described in the corresponding sections: Sequence and 3D structure 
identification and description, Nomenclature, and Numbering. 

IMGT Repertoire 

IMGT Repertoire is the global Web Resource in ImMunoGeneTics for the IG, TR, MHC and RPI of human 
and other vertebrates, based on the IMGT Scientific chart. IMGT Repertoire provides an easy-to-use 
interface to carefully and expertly annotated data on the genome, proteome, polymorphism and structural 
data, organized in three major sections: IMGT Repertoire (IG and TR), IMGT Repertoire (MHC) and 
IMGT Repertoire (RPI) [Lefranc, 2001a]. Only titles of this large resource are quoted here, with links as 
examples, to IMGT Repertoire (IG and TR). Genome data ("Locus and genes") include chromosomal 
localizations, locus representations, locus description, gene tables, potential germline repertoires, lists of 
IG and TR genes and links between IMGT, HUGO, GDB, LocusLink and OMIM, correspondence 
between nomenclatures [Lefranc and Lefranc, 2001a; 2001b], references sequences [Barbie and Lefranc, 
1998; Pallares et a/., 1998; 1999; Ruiz et ah, 1999; Foich and Lefranc, 2000a; 2000b; Scaviner and 
Lefranc, 2000a; 2000b; Martinez -Jean et al. t 2001; Bosc and Lefranc, 2003]. Proteome and 
polymorphism data ("Proteins and alleles") are represented by protein displays which show translated 
sequences of the allele *01 of each functional or ORF gene [Lefranc and Lefranc, 2001a; 2001b; 
Scaviner etaf., 1999; Folch et a/., 2000], alignments of alleles, tables of alleles, allotypes, particularities in 
protein designations, IMGT reference directory in FASTA format, correspondence between IG and TR 
chain and receptor IMGT designations [Lefranc and Lefranc, 2001a; 2001b]. Structural data ("2D and 3D 
structures") comprise 2D graphical representations designated as Colliers de Perles [Lefranc et al., 1998; 
1999], FR-IMGT and CDR-IMGT lengths, and 3D representations of IG and TR variable domains [Ruiz et 
al., 2002; Lefranc et al., 2003]. This visualization permits rapid correlation between protein sequences 
and 3D data retrieved from the Protein Data Bank (PDB). Other data comprise: Probes and RFLP, with 
phages, probes used for the analysis of IG and TR gene rearrangements and expression, and Restriction 
Fragment Length Polymorphism (RFLP) studies; Taxonomy of vertebrate species present in 
IMGT/LIGM-DB; Gene regulation and expression, with data on promoters, primers, cDNAs and reagent 
monoclonal antibodies; and Genes and clinical entities, with translocations and inversions, humanized 
antibodies, and monoclonal antibodies with clinical indications. 
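
The FR-IMGT and CDR-IMGT lengths mentioned above can be derived directly from a gapped, IMGT-numbered V-REGION sequence by counting the occupied positions in each region. The following is a minimal sketch, not IMGT software: the function names are hypothetical, the region boundaries are those of the IMGT unique numbering for V-REGION (assumed here from the numbering publications rather than stated in this section), and the example sequence is synthetic.

```python
# Delimitations of the IMGT unique numbering for a V-REGION (1-based, inclusive).
# Assumption: boundaries taken from the IMGT unique numbering, not from this section.
IMGT_V_REGIONS = {
    "FR1-IMGT": (1, 26),
    "CDR1-IMGT": (27, 38),
    "FR2-IMGT": (39, 55),
    "CDR2-IMGT": (56, 65),
    "FR3-IMGT": (66, 104),
}


def region_lengths(gapped_sequence, gap_chars=".-"):
    """Count the occupied (non-gap) positions in each FR-IMGT and CDR-IMGT.

    The input must already be aligned so that character i (1-based)
    corresponds to position i of the numbering; gaps mark unoccupied positions.
    """
    if len(gapped_sequence) < 104:
        raise ValueError("expected at least 104 IMGT-numbered positions")
    lengths = {}
    for name, (start, end) in IMGT_V_REGIONS.items():
        segment = gapped_sequence[start - 1:end]
        lengths[name] = sum(1 for residue in segment if residue not in gap_chars)
    return lengths


def cdr_length_notation(lengths):
    """Format the CDR1-IMGT and CDR2-IMGT lengths in brackets, e.g. [8.8]."""
    return "[{}.{}]".format(lengths["CDR1-IMGT"], lengths["CDR2-IMGT"])


# Synthetic example: 8 occupied positions in each CDR, the rest are gaps.
example = "A" * 26 + "G" * 8 + "." * 4 + "W" * 17 + "S" * 8 + "." * 2 + "Y" * 39
print(cdr_length_notation(region_lengths(example)))
```

CDR3-IMGT is left out of the sketch because its length depends on the rearranged junction rather than on the germline V-REGION alone.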



IMGT Bloc-notes 

IMGT Bloc-notes is a selection of useful links for immunoinformatics, immunogenetics, immunology, 
genetics, molecular biology and bioinformatics. IMGT Bloc-notes is organized in several sections. The 
IMGT immunoinformatics page comprises links to databases, tools and resources on IG, TR, MHC and 
RPI. Interesting links provide numerous hyperlinks towards the Web servers specializing in immunology, 
genetics, molecular biology and bioinformatics (associations, biopharmaceuticals, collections, companies, 
databases, immunology themes, journals, molecular biology servers, resources, societies, tools, etc.) 
[Lefranc, 2000e]. Other sections are meeting announcements, postdoctoral positions, etc. 



IMGT Education 

IMGT Education is a section which provides useful biological resources for students. It includes the IMGT 
Aide-memoire, which provides easy access to information such as the genetic code, splicing sites, amino 
acid structures, restriction enzyme sites, etc.; Questions and answers; and Tutorials (in English and/or in 
French) on 3D structure, immunoglobulins and B cells, T cell receptors and T cells, NK receptors, 
pathologies of the immune system, cancer, AIDS, etc. 

IMGT Index 

IMGT Index is a referential index which provides a fast way to access data when information has to be 
retrieved from different parts of the IMGT site. For example, "allele" provides links to the IMGT Scientific 
chart rules for the allele description, and to the IMGT Repertoire Alignments of alleles and Tables of 
alleles. 




Conclusion 

Since July 1995, IMGT has been available on the Web at http://imgt.cines.fr. IMGT provides biologists 
with an easy-to-use and friendly interface. Since January 2000, the IMGT www server at Montpellier has 
been accessed by more than 250,000 sites. IMGT has an exceptional response, with more than 120,000 
requests a month. Two thirds of the visitors are equally distributed between the European Union and the 
United States. To facilitate the integration of IMGT data into applications developed by other laboratories, 
we have built an Application Programming Interface (API) to access the database [Giudicelli et al., 
1998a]. This API includes a set of URL links to access biological knowledge data (keywords, labels, 
functionalities, list of gene names, etc.) and a set of URL links to access all data related to one given 
sequence. To increase interoperability with other ontologies and information systems, IMGT-ONTOLOGY 
is currently being written using an XML (Extensible Markup Language) approach, in IMGT-ML [Chaume et 
al., 2001; 2003]. By making data portable, XML is useful both internally for the integration of data and 
externally for sharing data with other information systems. Because of this data integration ability, XML 
has become the underpinning for Web-related computing. IMGT-ML defines XML schemas to encode 
data with XML tags respecting the IMGT-ONTOLOGY concepts. IMGT-ML schemas will be used for 
distributing data using the Web-services technology. IMGT distributes high quality data with an important 
incremental value added by the IMGT expert annotations, according to the rules described in the IMGT 
Scientific chart. Control of coherence in IMGT combines data integrity control and biological data 
evaluation [Giudicelli et al., 1998a; 1998b]. The information provided by IMGT is of much value to 
clinicians and biological scientists in general [Lefranc, 2002; 2003b; Chardes et al., 2002]. 
IMGT/PROTEIN-DB, a protein database for IG and TR, will contain translations of potentially functional 
and ORF sequences from IMGT/LIGM-DB, and protein data from Kabat [Kabat et al., 1991] and PDB. 
IMGT is designed to allow a common access to all immunogenetics data, and particular attention is given 
to the establishment of cross-referencing links to other databases pertinent to the users of IMGT. 
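
The principle behind IMGT-ML described above, encoding data with XML tags that follow IMGT-ONTOLOGY concepts so that other systems can read them, can be illustrated with a short sketch. This is not the IMGT-ML schema: every element and attribute name below is hypothetical, the accession is a placeholder, and the positions are illustrative. Only the label names (V-REGION, FR1-IMGT) follow the kind of controlled vocabulary discussed in this paper.

```python
import xml.etree.ElementTree as ET


def build_sequence_record(accession, species, gene_name, labels):
    """Encode one annotated sequence as XML.

    All element and attribute names are hypothetical stand-ins for
    whatever the real IMGT-ML schemas define.
    """
    record = ET.Element("sequence", {"accession": accession})
    ET.SubElement(record, "species").text = species
    ET.SubElement(record, "gene").text = gene_name
    description = ET.SubElement(record, "description")
    for name, start, end in labels:
        ET.SubElement(
            description,
            "label",
            {"name": name, "start": str(start), "end": str(end)},
        )
    return record


def read_labels(xml_text):
    """Recover (name, start, end) tuples from the XML text, as a consumer would."""
    root = ET.fromstring(xml_text)
    return [
        (label.get("name"), int(label.get("start")), int(label.get("end")))
        for label in root.iter("label")
    ]


# Placeholder accession and illustrative positions, not real IMGT data.
record = build_sequence_record(
    "EXAMPLE0001",
    "Homo sapiens",
    "IGHV1-2",
    [("V-REGION", 1, 296), ("FR1-IMGT", 1, 75)],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(read_labels(xml_text))
```

Because the description travels as tagged text, the receiving system needs only an XML parser and knowledge of the tag vocabulary, which is the portability argument made above.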



Citing IMGT 

If you use IMGT databases, tools and/or Web resources, please cite [Lefranc, 2003a], and this paper as 
references, and quote the IMGT Home page URL address, http://imgt.cines.fr. 

IMGT, the international ImMunoGeneTics database (http://imgt.cines.fr), is a 
high quality integrated information system specializing in immunoglobulins 
(IG), T cell receptors (TR) and major histocompatibility complex (MHC) of 
human and other vertebrates. IMGT provides a common access to expertly 
annotated data on the genome, proteome, genetics and structure of the IG and 
TR, based on the IMGT Scientific chart and IMGT-ONTOLOGY. The IMGT 
unique numbering defined for the IG and TR variable regions and domains of 
all jawed vertebrates has allowed a redefinition of the limits of the framework 
(FR-IMGT) and complementarity determining regions (CDR-IMGT), 
leading, for the first time, to a standardized description of mutations, allelic 
polymorphisms, 2D representations (Colliers de Perles) and 3D structures, 
whatever the antigen receptor, the chain type, or the species. The IMGT 
numbering has been extended to the V-like domain and is, therefore, highly 
valuable for comparative analysis and evolution studies of proteins belonging 
to the IG superfamily. 